272        Bioinformatics

BioProject by the above accession number, or simply copy and paste the following URL into
your web browser:

https://www.ncbi.nlm.nih.gov/sra/?term=PRJEB24421

Then, use the “Send to” dropdown menu to download the runinfo text file. After downloading
the text file, open it in Excel, delete all columns except the one with the run
accessions, remove the column header as well, and save the file as “runids.txt” in the
“data” subdirectory.
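The same column extraction can also be done on the command line instead of in Excel. The following is a minimal sketch, assuming the downloaded runinfo file is comma-separated, has the run accession in its first column, and has a header row whose first field is "Run". The function name `extract_runids` and the input file name in the usage comment are hypothetical.

```shell
# Sketch: pull the run accessions out of a downloaded runinfo CSV.
# Assumptions: comma-separated file, accession in column 1, header field "Run".
extract_runids() {
    # $1: path to the runinfo CSV; prints one accession per line
    tr -d '\r' < "$1" |            # drop Windows carriage returns, if any
        cut -d, -f1 |              # keep only the first column
        sed -e '/^$/d' -e '/^Run$/d'   # remove blank lines and the header
}

# Hypothetical usage (the input file name is an assumption):
# extract_runids SraRunInfo.csv > data/runids.txt
```

Stripping carriage returns first is a precaution: a file saved on Windows would otherwise leave an invisible character at the end of each accession.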

Alternatively, you can use the following EDirect script, which extracts the
run accessions and stores them in a file named “runids.txt” in the “data” subdirectory
(NCBI Entrez Direct must be installed):

esearch -db sra -query 'PRJEB24421[bioproject]' \
  | efetch -format runinfo \
  | cut -f1 -d, > data/runids.txt

sed -i '/^$/d' data/runids.txt
sed -i '/^Run/d' data/runids.txt

Check that the file has been saved successfully by using the “ls data/” command, or
display the file content by using the “vim data/runids.txt” command.
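Beyond looking at the file, a quick sanity check can confirm that it holds the expected number of accessions and nothing else. This is a minimal sketch, not part of the original workflow: the function name `check_runids` is hypothetical, and it assumes each line should be an SRA run accession of the form SRR, ERR, or DRR followed by digits.

```shell
# Sketch: verify a run-accession list.
# Assumption: valid lines look like SRR/ERR/DRR followed by digits.
check_runids() {
    # $1: accession file, $2: expected number of accessions
    n=$(grep -c . "$1")                          # non-empty lines
    bad=$(grep -vcE '^[SED]RR[0-9]+$' "$1")      # lines that are not accessions
    echo "accessions: $n, malformed lines: $bad"
    [ "$n" -eq "$2" ] && [ "$bad" -eq 0 ]        # succeed only if both checks pass
}

# Hypothetical usage for this project:
# check_runids data/runids.txt 86 && echo "run list looks fine"
```

A leftover header line, a blank line, or a stray carriage return would each be reported as a malformed line.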

After saving the 86 run accessions in the “data/runids.txt” file, you
can download the raw FASTQ files from the NCBI SRA database either by saving
the following script in a bash file named “download.sh” and running it with “bash download.sh”, or
by entering the script directly at the terminal command-line prompt while you are in the
project directory:

while read f;
do
  fasterq-dump --progress --outdir data "$f"
done < data/runids.txt

The download progress is displayed as the script runs. The files require only 771.29 MB of storage space.
The 172 FASTQ files will be downloaded into the “data” subdirectory, two files for each
sample. When the download has finished, you can check the content
of the “data” subdirectory and count the FASTQ files using the following
command:

ls data/*.fastq | wc -l

The number of files should be 172. If it is not, you may need to run the download script

again.
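If the count is short, it can help to find out which runs are incomplete rather than downloading everything again. The following is a minimal sketch under one assumption: that each paired-end run is written as two files named with the accession followed by "_1.fastq" and "_2.fastq". The function name `missing_runs` is hypothetical.

```shell
# Sketch: list accessions whose paired FASTQ files are missing or empty.
# Assumption: output files are named <run>_1.fastq and <run>_2.fastq.
missing_runs() {
    # $1: accession list, $2: directory holding the FASTQ files
    while read -r run; do
        [ -n "$run" ] || continue                 # skip blank lines
        if [ ! -s "$2/${run}_1.fastq" ] || [ ! -s "$2/${run}_2.fastq" ]; then
            printf '%s\n' "$run"                  # report the incomplete run
        fi
    done < "$1"
}

# Hypothetical usage: download only the incomplete runs again.
# missing_runs data/runids.txt data > data/missing.txt
# while read -r f; do
#     fasterq-dump --progress --outdir data "$f"
# done < data/missing.txt
```

The `-s` test treats a zero-byte file as missing, so a download that was interrupted before any data was written is also reported.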

7.3.3.2  Creating the Sample Metadata File

Open the NCBI SRA page using the above URL. Then, open the Run Selector from the “Send to” dropdown
menu. All runs will be displayed in the Run Selector. Click the “Metadata” button